Skip to content

feat: support v3 column default values in UpdateSchema (3/4)#793

Open
huan233usc wants to merge 4 commits into
apache:mainfrom
huan233usc:feat/default-values-evolution
Open

feat: support v3 column default values in UpdateSchema (3/4)#793
huan233usc wants to merge 4 commits into
apache:mainfrom
huan233usc:feat/default-values-evolution

Conversation

@huan233usc

@huan233usc huan233usc commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

What

Part 3 of 4 of Iceberg v3 column default-value support (POC #731), built on the
schema-layer support merged in #746. Independent of the read-path PRs (#792 and
the Avro follow-up).

Adds default-value handling to UpdateSchema (schema evolution).

Changes

  • AddColumn / AddRequiredColumn take an optional default_value. When
    provided it is set as both the column's initial-default and write-default.
    A non-null default also lets a required column be added without
    AllowIncompatibleChanges() — rows written before the change read the default
    instead of null.
  • UpdateColumnDefault(name, default) (new) sets, or clears with
    std::nullopt, a column's write-default; the initial-default is fixed when
    the column is added.
  • Defaults are cast to the column type (rejecting uncastable or out-of-range
    values) and preserved across rename / doc / type-promotion updates and
    nested field-id reassignment.
  • RequireColumn may now mark a column that was added with a default required.

The SchemaField constructor stores defaults verbatim (it does not coerce
them), so the cast/promotion is performed explicitly at each evolution site —
the same effect as Java, where NestedField's constructor runs castDefault.
Same-scale decimal precision widening is handled directly (the unscaled value is
unchanged), since Literal::CastTo does not cast between decimal types.

Tests

13 cases in update_schema_test.cc: add optional/required/nested column with a
default, mismatched/narrowing rejection, UpdateColumnDefault
(set / clear / cast-to-type / pre-existing column), require-after-default, and
preservation across doc updates and type promotion — including same-scale
decimal precision promotion.

Stack

  1. feat(schema): represent, serialize and validate v3 column default values (1/4) #746 — schema: represent / serialize / validate (merged)
  2. feat(parquet): apply column default values when reading missing fields (2/4) #792 — read path: Parquet
  3. read path: Avro (follows)
  4. this PR — schema evolution: addColumn / updateColumnDefault

@huan233usc huan233usc changed the title feat(evolution): support v3 column default values in UpdateSchema (4/4) feat(evolution): support v3 column default values in UpdateSchema (3/4) Jun 29, 2026
bool is_defaulted_add = false;
// A column added in this update with a default value can be made required: rows
// written before the change read the initial-default instead of null.
bool is_defaulted_add = added_name_to_id_.contains(CaseSensitivityAwareName(name)) &&

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For map/list nested adds, AddColumnInternal stores the new field in added_name_to_id_ using the canonical internal path:

  added_name_to_id_[CaseSensitivityAwareName(full_name)] = new_id;

where full_name becomes something like:

  locations.value.alt
  points.element.z

But RequireColumn receives and checks the user-facing shorthand path:

  added_name_to_id_.contains(CaseSensitivityAwareName(name))

where name is typically:

  locations.alt
  points.z

So the lookup misses, even though it refers to the same added column

@huan233usc huan233usc Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this -- yes it stored only the canonical full_name but callers use the short name. Fixed by also indexing the short name (matching how the schema name index handles map value / list element paths), with a regression test for requiring a defaulted nested column by short name in the same transaction.

It seems that same bug exists in Java; working on a fix apache/iceberg#17034.

@huan233usc huan233usc force-pushed the feat/default-values-evolution branch from f51fb4a to d87a2b8 Compare July 1, 2026 19:39
@huan233usc huan233usc changed the title feat(evolution): support v3 column default values in UpdateSchema (3/4) feat: support v3 column default values in UpdateSchema (3/4) Jul 1, 2026
@huan233usc huan233usc force-pushed the feat/default-values-evolution branch from d87a2b8 to 1eb9681 Compare July 1, 2026 22:27
@huan233usc huan233usc requested a review from manuzhang July 1, 2026 22:30
Comment thread src/iceberg/update/update_schema.cc Outdated
*new_default);
ICEBERG_BUILDER_ASSIGN_OR_RETURN_WITH_ERROR(
Literal typed_default,
new_default->CastTo(internal::checked_pointer_cast<PrimitiveType>(field.type())),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we share the decimal default promotion logic used above in UpdateColumn here too? Literal::CastTo only returns decimal values when source and target DecimalType are exactly equal; it does not handle same-scale precision widening. That means adding decimal(18,2) with Literal::Decimal(..., 9, 2) is rejected even though this is the same allowed promotion handled above. The same issue appears in AddColumnInternal.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — shared it into a CastDefaultToType helper used in all three spots, with tests. Thanks!

Comment thread src/iceberg/update/update_schema.cc Outdated
target_type->type_id() == TypeId::kDecimal) {
const auto& decimal_type = internal::checked_cast<const DecimalType&>(*target_type);
return Literal::Decimal(std::get<Decimal>(value.value()).value(),
decimal_type.precision(), decimal_type.scale());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we guard this decimal fast path with a same-scale check before rebuilding the literal? Decimal stores only the unscaled value, so accepting decimal(9,3) as a default for decimal(18,2) would reinterpret 1234 from 1.234 to 12.34 instead of rejecting it. Java default parsing/serialization requires the decimal default scale to match the column type, so only same-scale precision widening should be accepted here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — added a same-scale guard so different-scale defaults fall through to CastTo and get rejected. Tested both paths. Thanks!

@huan233usc huan233usc requested review from manuzhang and wgtmac July 2, 2026 15:19
AddColumn / AddRequiredColumn now accept an optional default value, used as
both the initial-default and write-default of the new column; a non-null
default also lets a required column be added (or an added column be made
required) without AllowIncompatibleChanges(). UpdateColumnDefault sets or
clears the write-default. Defaults are cast to the column type (rejecting
uncastable or out-of-range values) and preserved across rename / doc / type
updates and nested field-id reassignment.

Part 4 of the v3 column-default-values work (POC apache#731), built on apache#746.
Extract the decimal same-scale precision-widening logic into a shared
CastDefaultToType helper and use it in UpdateColumn, UpdateColumnDefault, and
AddColumnInternal. Previously only UpdateColumn handled it, so setting or adding
a decimal default whose precision differs from the column (e.g. Decimal(_,9,2)
into a decimal(18,2) column) was wrongly rejected.
Guard the decimal default fast path in CastDefaultToType with a same-scale
check. Decimal stores only the unscaled value, so rebuilding a decimal(9,3)
default as decimal(18,2) would reinterpret 1234 from 1.234 to 12.34. Only
same-scale precision widening is accepted; a differing scale falls through to
CastTo and is rejected, matching Java's requirement that a decimal default's
scale match the column type.
@huan233usc huan233usc force-pushed the feat/default-values-evolution branch from 4ff0974 to 171e233 Compare July 2, 2026 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants